Categories

Versions

In Database Nest (In-Database Processing)

Synopsis

This meta operator allows you to define a subprocess that extracts data from a database.

Description

This operator translates the steps that its subprocess defines to SQL, and submits that code into the selected database. The operator specifies the database connection to use. Only operators from the In-Database Processing operator group can be used inside this meta operator. The operator has no input ports, but it can have arbitrary number of outputs. Each of those outputs delivers the in-memory data set result of the query generated by the operators connected to that port. If you enable parallelization, these queries may run concurrently in you database. The queries are submitted in the natural order of the output ports. The generated query is available as an Annotation for each data set and can be further processed with Annotations to Data or other Annotation operators.

There are predefined macros available inside the Nest to be used in parameters. Please check the following list.

  • process_start_db: process start date and time in a database-specific format based on the selected database type - note that the value could be a quoted literal, or a date function expression as well
  • t_db: current date and time in a database-specific format based on the selected database type - note that the value could be a quoted literal, or a date function expression as well
  • db_id: database type, that is, the SQL dialect used (either chosen manually or autodetected)

Input

  • connection (Connection)

Output

  • connection

  • example set 1

Parameters

  • connection entry Indicates how the database connection should be specified.
  • limit sample size Limit sample size. Must be empty (no limit) or a positive integer.
  • autodetect SQL dialect Detect the SQL dialect automatically from the connection settings. If unchecked, you must select the dialect in another parameter.
  • SQL dialect SQL dialect that the database uses. This can be automatically detected for supported databases by checking autodetect_SQL_dialect. For officially not supported databases, you must choose the dialect via this parameter explicitly.
  • parallelize Set to true to submit multiple queries concurrently to the database. Multiple Nest outputs require multiple queries, and setting this parameter to true makes them run in parallel. Note that this may increase the load on your source database significantly.
  • parallelization limit Maximum number of queries to submit to the database concurrently. Minimum is 2, but you can disable parallelization entirely by unchecking parallelize parameter.
  • parallelization delay Time in milliseconds to wait between submitting concurrent queries. The goal here is to avoid hitting API limits in case of cloud sources, or any other database limitations. Please consult your database administrator on what limitations apply in your environment.
  • allow all operators Set to true to allow using any type of operators inside the Nest, not just In-Database operators. Note that this option should only be considered as a last resort. These operators may not behave the originally intended way, their behaviour may depend on inner implementation details, and no backward compatibility is guaranteed if such operators are used inside the Nest. Also note that automatic replacement of non-In-Database operators when they are drag and dropped into the Nest is disabled if this parameter is set to true. Please only use this option if you know what you are doing, and ready to experiment.